Placing Objects in Gesture Space: Toward Incremental Interpretation of Multimodal Spatial Descriptions
Authors
Abstract
When describing routes not in the current environment, a common strategy is to anchor the description in configurations of salient landmarks, complementing the verbal description by “placing” the non-visible landmarks in gesture space. Understanding such multimodal descriptions and later locating the landmarks in the real world is a challenging task for the hearer, who must interpret speech and gestures in parallel, fuse information from both modalities, build a mental representation of the description, and ground that knowledge to real-world landmarks. In this paper, we model the hearer’s task using a multimodal spatial description corpus we collected. To reduce the variability of the verbal descriptions, we simplified the setup to use simple objects as landmarks. We describe a real-time system with which we evaluate the separate and joint contributions of the two modalities. We show that gestures not only improve overall system performance, even though they largely encode redundant information, but also lead to earlier final correct interpretations. Being able to build and apply representations incrementally will be of use in more dialogical settings, we argue, where it can enable immediate clarification in cases of mismatch.
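The hearer’s task described in the abstract (interpret speech and gesture in parallel, fuse evidence, maintain a current-best interpretation) can be sketched as a simple incremental scorer. This is a hypothetical illustration, not the authors’ system: attribute words contribute verbal evidence, gesture “placements” contribute Gaussian-weighted positional evidence, and both update a running score per candidate object.

```python
# Hypothetical sketch of incremental multimodal fusion (not the paper's system).
# Speech tokens add attribute evidence; gesture placements add positional
# evidence in gesture space; scores are fused incrementally per event.
import math
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    color: str
    shape: str
    pos: tuple  # (x, y) location in gesture space


@dataclass
class IncrementalInterpreter:
    objects: list
    scores: dict = field(default_factory=dict)

    def __post_init__(self):
        self.scores = {o.name: 0.0 for o in self.objects}

    def speech_token(self, token):
        # A word matching an object's attribute adds verbal evidence.
        for o in self.objects:
            if token in (o.color, o.shape):
                self.scores[o.name] += 1.0

    def gesture_placement(self, x, y, sigma=0.5):
        # Positional evidence: objects near the placed point gain score.
        for o in self.objects:
            d2 = (o.pos[0] - x) ** 2 + (o.pos[1] - y) ** 2
            self.scores[o.name] += math.exp(-d2 / (2 * sigma ** 2))

    def best(self):
        # Current-best interpretation, available after every increment.
        return max(self.scores, key=self.scores.get)


objs = [Obj("a", "red", "circle", (0.0, 0.0)),
        Obj("b", "blue", "square", (1.0, 1.0))]
interp = IncrementalInterpreter(objs)
interp.speech_token("red")          # verbal evidence alone
interp.gesture_placement(0.1, 0.0)  # largely redundant gesture confirms early
print(interp.best())                # → a
```

Even when the gesture is redundant with the speech (as in this toy run), it raises the leading candidate’s margin earlier, which is the intuition behind the paper’s finding on earlier final correct interpretations.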
Similar resources
A Corpus of Natural Multimodal Spatial Scene Descriptions
We present a corpus of multimodal spatial descriptions, as commonly occurring in route giving tasks. Participants provided natural spatial scene descriptions with speech and abstract deictic/iconic hand gestures. The scenes were composed of simple geometric objects. While the language denotes object shape and visual properties (e.g., colour), the abstract deictic gestures “placed” objects in ge...
Full text
Interpretation of Multimodal Designation with Imprecise Gesture
We are interested in multimodal systems that use the following modes and modalities: speech (and natural language) as both input and output, gesture as input, and visual output via screen displays. The user interacts with the system through gestures and/or oral statements in natural language. This exchange, encoded in the different modalities, carries the goal of the user and also the designati...
Full text
Logo and Cover Credits
In the description of object shapes, humans usually perform iconic gestures that coincide with speech (are coverbal). Marked by a similarity between the gestural sign and the described object, iconic gestures may easily depict content difficult to describe using words alone. Though the expressive potential of iconic gestures in human-computer communication is generally acknowle...
Full text
What and Where: An Empirical Investigation of Pointing Gestures and Descriptions in Multimodal Referring Actions
Pointing gestures are pervasive in human referring actions, and are often combined with spoken descriptions. Combining gesture and speech naturally to refer to objects is an essential task in multimodal NLG systems. However, the way gesture and speech should be combined in a referring act remains an open question. In particular, it is not clear whether, in planning a pointing gesture in conjunc...
Full text
Incremental Generation of Multimodal Deixis Referring to Objects
This paper describes an approach to generating multimodal deixis to be uttered by an anthropomorphic agent in virtual reality. The proposed algorithm integrates pointing and definite description: the context-dependent discriminatory power of the gesture determines the content selection for the verbal constituent. The concept of a pointing cone is used to model the region single...
Full text
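The pointing-cone idea in the last related paper (the gesture’s discriminatory power determining verbal content selection) can be made concrete with a small sketch. This is an illustrative reconstruction under assumed geometry and attribute sets, not the cited algorithm: objects inside the cone are the candidates the pointing gesture leaves ambiguous, and verbal attributes are added only if they rule out remaining distractors.

```python
# Hypothetical sketch: a pointing cone narrows the candidate set, and the
# verbal description only adds attributes that further discriminate.
# All names and parameters here are illustrative assumptions.
import math


def in_cone(apex, direction, half_angle, point):
    """True if `point` lies inside the cone from `apex` along `direction`."""
    vx, vy = point[0] - apex[0], point[1] - apex[1]
    norm = math.hypot(vx, vy)
    if norm == 0:
        return True  # the apex itself counts as inside
    cos_a = (vx * direction[0] + vy * direction[1]) / (norm * math.hypot(*direction))
    return math.acos(max(-1.0, min(1.0, cos_a))) <= half_angle


def select_content(target, objects, apex, direction, half_angle,
                   attrs=("color", "shape")):
    """Add verbal attributes until the target is unique among cone candidates."""
    candidates = [o for o in objects if in_cone(apex, direction, half_angle, o["pos"])]
    description = []
    for a in attrs:
        if len(candidates) <= 1:
            break  # pointing (plus words so far) already discriminates
        filtered = [o for o in candidates if o[a] == target[a]]
        if len(filtered) < len(candidates):  # attribute rules out distractors
            description.append(target[a])
            candidates = filtered
    return description


objects = [
    {"name": "t",   "color": "red", "shape": "ball", "pos": (1.0, 0.1)},
    {"name": "d",   "color": "red", "shape": "cube", "pos": (1.0, -0.1)},
    {"name": "far", "color": "red", "shape": "ball", "pos": (-1.0, 0.0)},
]
target = objects[0]
words = select_content(target, objects, apex=(0.0, 0.0),
                       direction=(1.0, 0.0), half_angle=math.radians(20))
print(words)  # → ['ball'] — "red" is skipped since all candidates are red
```

A wider cone admits more distractors and so forces a longer description, which captures the trade-off between gestural precision and verbal content that the paper investigates.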